Abstract: A host of problems involve the recovery of structured
signals from a dimensionality-reduced representation such as a
random projection; examples include sparse signals (compressive
sensing) and low-rank matrices (matrix completion). Given the wide
range of different recovery algorithms developed to date, it is
natural to ask whether there exist "universal" algorithms for
recovering "structured" signals from their linear projections. We
recently answered this question in the affirmative in the
noise-free setting. In this paper, we extend our results to the
case of noisy measurements.

I. Introduction 

Data compressors are ubiquitous in the digital world. They 
are built based on the premise that text, images, videos, 
etc. all are highly structured objects, and hence exploiting 
those structures can dramatically reduce the number of bits 
required for their storage. In recent years, a parallel trend has 
been developing for sampling analog signals. There too, the 
idea is that many analog signals of interest have some kind 
of structure that enables considerably lowering the sampling 
rate from the Shannon-Nyquist rate. 

The first structure that was extensively studied in this 
context is sparsity. It has been observed that many nat- 
ural signals have sparse representations in some domain. 
The term compressed sensing (CS) refers to the process 
of undersampling a high-dimensional sparse signal through 
linear measurements and recovering it from those measure- 
ments using efficient algorithms [1], [2]. Low-rankedness [3], 
model-based compressed sensing [4]-[8], and finite rate of 
innovation [9] are examples of some other structures that 
have already been explored in the literature. 

While in the original source coding problem introduced by 
Shannon [10], the assumption was that the source distribution 
is known both to the encoder and to the decoder, and hence 
is used in the code design, it was later shown that this 
information is not essential. In fact, universal compression 
algorithms are able to code stationary ergodic processes at 
their entropy rates, without knowing the source distribution 
[11]. In other words, there exists a family of compression 
codes that are able to code any stationary ergodic process at 
its entropy rate asymptotically [11]. The same result holds 
for universal lossy compression. 

One can ask similar questions for the problem of undersampling
"structured" signals: How should the class of "structured" signals
be defined? Are there sampling and recovery algorithms that
reconstruct "structured" signals from their linear measurements
without knowledge of the underlying structure? Does this ignorance
incur a cost in the sampling rate?

In algorithmic information theory, Kolmogorov complexity,
introduced by Solomonoff [12], Kolmogorov [13], and Chaitin [14],
defines a universal notion of complexity for finite-alphabet
sequences. Given a finite-alphabet sequence $x$, the Kolmogorov
complexity of $x$, $K(x)$, is defined as the length of the shortest
computer program that prints $x$ and halts. In [15], extending the
notion of Kolmogorov complexity to real-valued signals¹ by their
proper quantization, we addressed some of the above questions. We
introduced the minimum complexity pursuit (MCP) algorithm for
recovering "structured" signals from their linear measurements. We
showed that finding the "simplest" solution satisfying the linear
measurements recovers the signal using many fewer measurements than
its ambient dimension.

In this paper, we extend the results of [15] to the case 
where the measurements are noisy. We first propose an 
updated version of MCP that takes into account that the 
measurements are a linear transformation of the signal plus 
Gaussian noise. We prove that the proposed algorithm is 
stable with respect to the noise and derive bounds on its 
reconstruction error in terms of the sampling rate and the 
variance of the noise. 

The organization of this paper is as follows. Section II 
defines the notation used throughout the paper. Section II-B 
defines Kolmogorov information dimension of a real-valued 
signal. Section III formally defines the MCP algorithm and 
reviews and extends some of the related results proved in 
[15]. Section IV considers the case of noisy measurements 
and proves that MCP is stable. Section V mentions some of 
the related work in the literature, and Section VI concludes 
the paper. Appendix A presents two useful lemmas used in 
the proofs. 



II. Definitions 



A. Notation 



Calligraphic letters such as $\mathcal{A}$ and $\mathcal{B}$ denote
sets. For a set $\mathcal{A}$, $|\mathcal{A}|$ and $\mathcal{A}^c$
denote its size and its complement, respectively. For a sample
space $\Omega$ and event set $\mathcal{A} \subseteq \Omega$,
$\mathbb{1}_{\mathcal{A}}$ denotes the indicator function of the
event $\mathcal{A}$. Bold-faced lower case letters denote vectors.
For a vector $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$,
its $\ell_p$ and $\ell_\infty$ norms are defined as
$\|\mathbf{x}\|_p^p \triangleq \sum_{i=1}^{n} |x_i|^p$ and
$\|\mathbf{x}\|_\infty \triangleq \max_i |x_i|$, respectively. For
integer $n$, let $I_n$ denote the $n \times n$ identity matrix.

For $x \in [0,1]$, let $((x)_1, (x)_2, \ldots)$, $(x)_i \in \{0,1\}$,
denote the binary expansion of $x$, i.e.,
$x = \sum_{i=1}^{\infty} 2^{-i}(x)_i$. The $m$-bit approximation of
$x$, $[x]_m$, is defined as $[x]_m \triangleq \sum_{i=1}^{m} 2^{-i}(x)_i$.
Similarly, for a vector $(x_1, \ldots, x_n) \in [0,1]^n$,

$$[x^n]_m \triangleq ([x_1]_m, \ldots, [x_n]_m).$$

¹ These types of extensions are straightforward and have already
been explored in [16].
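The $m$-bit approximation defined above can be illustrated numerically. The following is a minimal sketch, assuming Python floats as stand-ins for real numbers in $[0,1]$; the function names `m_bit` and `m_bit_vec` are hypothetical and are not part of the original text.

```python
import math

def m_bit(x: float, m: int) -> float:
    """Return [x]_m, the first m bits of the binary expansion of x in [0, 1]."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    # floor(x * 2^m) keeps the first m bits; the cap handles x = 1 = 0.111...
    q = min(math.floor(x * 2**m), 2**m - 1)
    return q / 2**m

def m_bit_vec(xs, m):
    """Apply the m-bit approximation to every coordinate of a vector."""
    return [m_bit(x, m) for x in xs]

# 0.8125 is 0.1101 in binary, so its 2-bit approximation is 0.11 = 0.75.
print(m_bit(0.8125, 2))
```

Scaling by a power of two is exact in floating point, so the truncation here does not introduce additional rounding error for inputs that are themselves representable.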

B. Kolmogorov complexity 

The Kolmogorov complexity of a finite-alphabet sequence $x$ with
respect to a universal Turing machine $\mathcal{U}$ is defined as
the length of the shortest program on $\mathcal{U}$ that prints $x$
and halts.² Let $K(x)$ denote the Kolmogorov complexity of binary
string $x \in \{0,1\}^* \triangleq \cup_{n \ge 1} \{0,1\}^n$.

Definition 1: For real-valued $x^n = (x_1, x_2, \ldots, x_n) \in
[0,1]^n$, define the Kolmogorov complexity of $x^n$ at resolution
$m$ as

$$K^{[\cdot]_m}(x^n) \triangleq K([x_1]_m, [x_2]_m, \ldots, [x_n]_m).$$

Definition 2: The Kolmogorov information dimension of vector
$(x_1, x_2, \ldots, x_n) \in [0,1]^n$ at resolution $m$ is defined
as

$$\kappa_{m,n} \triangleq \frac{K^{[\cdot]_m}(x_1, x_2, \ldots, x_n)}{m}.$$

To clarify the above definition, we derive an upper bound for
$\kappa_{m,n}$.

Lemma 1: For $(x_1, x_2, \ldots) \in [0,1]^\infty$ and any
resolution sequence $\{m_n\}$, we have

$$\limsup_{n \to \infty} \frac{\kappa_{m_n,n}}{n} \le 1.$$

Therefore, by Lemma 1, we call a signal compressible if
$\limsup_{n \to \infty} n^{-1}\kappa_{m_n,n} < 1$. As stated in the
following proposition, Lemma 1's upper bound on $\kappa_{m_n,n}$ is
achievable.

Proposition 1: Let $\{X_i\}_{i=1}^{\infty}$ be i.i.d.
$\mathrm{Unif}[0,1]$. Then,

$$\frac{1}{mn} K^{[\cdot]_m}(X_1, X_2, \ldots, X_n) \to 1$$

in probability.
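Kolmogorov complexity is not computable, so the dichotomy between compressible and incompressible signals can only be illustrated with a stand-in. The sketch below uses the length of a `zlib`-compressed encoding of the quantized signal as a crude, computable proxy for $K^{[\cdot]_m}$; this substitution is an assumption made for illustration and is not the quantity analyzed in the paper. The names `pack_quantized` and `proxy_dimension` are hypothetical.

```python
import random
import zlib

def pack_quantized(xs, m):
    """Concatenate the m-bit approximations of xs into one byte string."""
    acc = 0
    for x in xs:
        q = min(int(x * 2**m), 2**m - 1)  # integer code of [x]_m
        acc = (acc << m) | q
    nbits = m * len(xs)
    return acc.to_bytes((nbits + 7) // 8, "big")

def proxy_dimension(xs, m):
    """Compressed length in bits divided by m: a rough stand-in for kappa_{m,n}."""
    return 8 * len(zlib.compress(pack_quantized(xs, m), 9)) / m

rng = random.Random(0)
n, m = 4000, 8
uniform = [rng.random() for _ in range(n)]  # i.i.d. Unif[0,1], as in Proposition 1
sparse = [0.0] * n                          # a one-sparse signal
sparse[17] = 0.75

ratio_uniform = proxy_dimension(uniform, m) / n
ratio_sparse = proxy_dimension(sparse, m) / n
print(ratio_uniform, ratio_sparse)
```

Under this proxy the uniform signal has a normalized dimension close to 1, while the one-sparse signal is far below 1, which mirrors the distinction drawn by Lemma 1 and Proposition 1.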

III. Minimum complexity pursuit 

Consider the problem of reconstructing a vector $x_o^n \in [0,1]^n$
from $d < n$ random linear measurements $y_o^d = Ax_o^n$. The MCP
algorithm proposed in [15] reconstructs $x_o^n$ from its linear
measurements $y_o^d$ by solving the following optimization problem:

$$\min\ K^{[\cdot]_m}(x_1, \ldots, x_n) \quad \text{s.t.} \quad Ax^n = y_o^d. \qquad (1)$$

Let the elements of $A \in \mathbb{R}^{d \times n}$, $A_{i,j}$, be
i.i.d. $\mathcal{N}(0,1)$.³ Let $\hat{x}_o^n = \hat{x}_o^n(y_o^d, A)$
denote the output of (1) to inputs $y_o^d = Ax_o^n$ and $A$.
Theorem 1 stated below is a generalization of Theorem 2 proved in
[15].

² See Chapter 14 of [11] for the exact definition of a universal
computer, and more details on the definition of Kolmogorov
complexity.

³ Note that in [15] we had assumed that $A_{i,j} \sim \mathcal{N}(0, 1/d)$.



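Theorem 1: Assume that $x_o = (x_{o,1}, x_{o,2}, \ldots) \in
[0,1]^\infty$. For integers $m$ and $n$, let $\kappa_{m,n}$ denote
the Kolmogorov information dimension of $x_o^n$ at resolution $m$.
Then, for any $\tau_n < 1$ and $t > 0$, we have

$$P\left(\|x_o^n - \hat{x}_o^n\|_2 > \left(\frac{\sqrt{nd^{-1}} + t + 1}{\tau_n} + 1\right)\sqrt{n2^{-2m+2}}\right) < 2^{2\kappa_{m,n}m}\, e^{\frac{d}{2}(1 - \tau_n^2 + 2\log\tau_n)} + e^{-\frac{d}{2}t^2}.$$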

Theorem 1 can be proved following the steps used in the 
proof of Theorem 2 in [15]. To interpret this theorem, in 
the following we consider several interesting corollaries that 
follow from Theorem 1. Note that in all of the results, the 
logarithms are to the base of Euler's number e. 
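To get a feel for the magnitudes involved, the bound can be evaluated numerically. The sketch below assumes that Theorem 1 bounds the probability that the error exceeds $\left(\frac{\sqrt{n/d}+t+1}{\tau}+1\right)\sqrt{n2^{-2m+2}}$ by $2^{2\kappa m}e^{\frac{d}{2}(1-\tau^2+2\log\tau)} + e^{-dt^2/2}$; the function names and the example values of $n$, $\kappa$, $t$ and $\tau$ are hypothetical choices for illustration.

```python
import math

def theorem1_threshold(n, d, m, t, tau):
    """Error threshold appearing inside the probability (assumed form)."""
    return ((math.sqrt(n / d) + t + 1) / tau + 1) * math.sqrt(n * 2.0 ** (-2 * m + 2))

def theorem1_log_terms(kappa, d, m, t, tau):
    """Natural logarithms of the two terms of the probability bound."""
    first = 2 * kappa * m * math.log(2) + 0.5 * d * (1 - tau**2 + 2 * math.log(tau))
    second = -0.5 * d * t**2
    return first, second

n, kappa = 10**6, 10
m = math.ceil(math.log(n))          # resolution m_n = ceil(log n)
d = math.ceil(kappa * math.log(n))  # d_n = ceil(kappa_n log n)
first, second = theorem1_log_terms(kappa, d, m, t=1.0, tau=0.1)
bound = math.exp(first) + math.exp(second)
print(m, d, theorem1_threshold(n, d, m, 1.0, 0.1), bound)
```

Working with the logarithms of the two terms avoids overflow in $2^{2\kappa m}$ when $\kappa m$ is large.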

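Corollary 1: Assume that $x_o = (x_{o,1}, x_{o,2}, \ldots) \in
[0,1]^\infty$ and $m = m_n = \lceil \log n \rceil$. Let
$\kappa_n \triangleq \kappa_{m_n,n}$. Then, if
$d_n = \lceil \kappa_n \log n \rceil$, for any $\epsilon > 0$, we
have $P(\|x_o^n - \hat{x}_o^n\|_2 > \epsilon) \to 0$, as
$n \to \infty$.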

Proof: For m = m n = [log n\ and d„ = \n n log n~\ 



X™|| 2 > 6) -> 0, 



(\Znd- 1 + t + 1 + l)Vn2- 2m ^+ 2 

< 2 (yiKnlogn]- 1 + (t + l)n- 1 + VriFA 



(2) 



Hence, fixing $t > 0$ and setting $\tau_n = \tau = 0.1$, for any
$\epsilon > 0$ and large enough values of $n$ we have

$$\left(\frac{\sqrt{nd^{-1}} + t + 1}{\tau} + 1\right)\sqrt{n2^{-2m+2}} < \epsilon.$$



Therefore, for n large enough, 

p fll „n A" II 2 



x, 



> e 



< e 1AK " 



^(l-r 2 +21ogr) + 



e 2 



-1.7Kn log n 



(3) 



which shows that as $n \to \infty$,
$P(\|x_o^n - \hat{x}_o^n\|_2 > \epsilon) \to 0$.

□

According to Corollary 1, if the complexity of the signal is
less than $\kappa$, then the number of linear measurements needed
for asymptotically perfect recovery is, roughly speaking, on
the order of $\kappa \log n$. In other words, the number of
measurements is proportional to the complexity of the signal and
grows only logarithmically with its ambient dimension.

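Corollary 2: Assume that $x_o = (x_{o,1}, x_{o,2}, \ldots) \in
[0,1]^\infty$ and $m = m_n = \lceil \log n \rceil$. Let
$\kappa_n \triangleq \kappa_{m_n,n}$. Then, if
$d = d_n = \lceil 3\kappa_n \rceil$, we have

$$P\left(\frac{1}{\sqrt{n}}\|x_o^n - \hat{x}_o^n\|_2 > \epsilon\right) \to 0,$$

as $n \to \infty$, for any $\epsilon > 0$.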
Proof: Setting T n = n~ - 5 , m = m n = 
d n = [3k„] in Theorem 1, it follows that 



*"l|2>e) ^0, 

[logn], and d 



1 



> 2 



_ dn' + (t 
^ 2 2K -n log?i c 1.5fc„(l-Ti _1 -logri) I g— | 
_ -(1.5-2 log 2)re n logn+K„(1.5-1.5n" 1 ; 



Since 1.5 — 2 log 2 > 0, for any e > and n large enough, 
we have 



Let e™ = - [x^} m and e™ = x™ - 



; "] m denote the 



2y fi^ 1 + (t + ljn- 1 + 2v / n- T < e. 
It follows that $P\left(\frac{1}{\sqrt{n}}\|x_o^n - \hat{x}_o^n\|_2 > \epsilon\right) \to 0$, as $n \to \infty$.

□ 

In other words, if we are interested in the normalized
mean square error, or per element squared distance, then
$3\kappa_n$ measurements are sufficient.
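The two measurement budgets can be compared directly. The following sketch simply evaluates $\lceil \kappa_n \log n \rceil$ (Corollary 1) and $\lceil 3\kappa_n \rceil$ (Corollary 2) with natural logarithms; the function names and the example values of $\kappa_n$ and $n$ are hypothetical.

```python
import math

def measurements_exact(kappa: float, n: int) -> int:
    """Budget of Corollary 1: ceil(kappa * log n), natural logarithm."""
    return math.ceil(kappa * math.log(n))

def measurements_normalized(kappa: float) -> int:
    """Budget of Corollary 2 for the per-element error: ceil(3 * kappa)."""
    return math.ceil(3 * kappa)

for n in (10**3, 10**6):
    print(n, measurements_exact(10, n), measurements_normalized(10))
```

For a fixed complexity, the first budget grows slowly with $n$ while the second does not depend on $n$ at all, at the price of the weaker, normalized error criterion.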

IV. Stability analysis of MCP 

In the previous section we considered the case where the
signal is exactly of low complexity and the measurements
are also noise-free. In this section, we extend the results to
noisy measurements, where $y^d = Ax_o^n + w^d$, with
$w^d \sim \mathcal{N}(0, \sigma^2 I_d)$. Assuming that the
complexity of the signal is known at the reconstruction stage, we
consider the following reconstruction algorithm:

$$\arg\min_{x^n} \|Ax^n - y^d\|_2^2, \quad \text{s.t.} \quad K^{[\cdot]_m}(x^n) \le \kappa_{m,n}m. \qquad (4)$$



Note that $\kappa_{m,n}m$ is an upper bound on the Kolmogorov
complexity of $x_o^n$ at resolution $m$. The main goal of this
section is to calculate the number of measurements required
for robust recovery in noise.
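Because the complexity constraint in (4) is not computable, the program can only be imitated on a toy scale. The sketch below replaces the constraint set by a small, explicitly enumerated family of "simple" vectors (one nonzero entry quantized to $m$ bits) and solves the least-squares selection by exhaustive search. The candidate family, the function names, and all parameter values are assumptions for illustration only; this is not the MCP algorithm itself.

```python
import random

def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def toy_noisy_mcp(A, y, candidates):
    """Return the candidate x minimizing ||A x - y||_2^2."""
    def residual(x):
        return sum((p - q) ** 2 for p, q in zip(matvec(A, x), y))
    return min(candidates, key=residual)

rng = random.Random(1)
n, d, m, sigma = 20, 8, 4, 0.01

# Hypothetical low-complexity class: a single nonzero m-bit entry.
candidates = []
for i in range(n):
    for q in range(1, 2**m):
        x = [0.0] * n
        x[i] = q / 2**m
        candidates.append(x)

x_true = [0.0] * n
x_true[3] = 0.75
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(d)]  # i.i.d. N(0,1)
y = [v + rng.gauss(0.0, sigma) for v in matvec(A, x_true)]       # noisy measurements

x_hat = toy_noisy_mcp(A, y, candidates)
support = max(range(n), key=lambda i: x_hat[i])
print(support, x_hat[support])
```

The search uses only $d = 8$ measurements for a signal of ambient dimension $n = 20$; the restriction to a small candidate class is what makes recovery possible, in the same spirit as the complexity constraint in (4).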

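Theorem 2: Assume that $x_o = (x_{o,1}, x_{o,2}, \ldots) \in
[0,1]^\infty$. For integers $m$ and $n$, let $\kappa_{m,n}$ denote
the information dimension of $x_o^n$ at resolution $m$. If
$m = m_n = \lceil \log n \rceil$ and $d = 8r\kappa_{m,n}m$, where
$r > 1$, then for any $\epsilon > 0$, we have

$$P\left(\|x_o^n - \hat{x}_o^n\|_2 > \sqrt{\frac{(2\kappa_{m,n}m)\sigma^2}{\rho d}} + \epsilon\right) \to 0, \qquad (5)$$

as $n \to \infty$, where $\rho \triangleq (1 - \sqrt{r^{-1}})^2/2$.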



According to Theorem 2, as long as $d > 8\kappa_n \log n$, the
algorithm is stable in the sense that the squared reconstruction
error is proportional to the variance of the input noise. By
increasing the number of measurements one may reduce the
reconstruction error.

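Proof: Since by definition $K^{[\cdot]_m}(x_o^n) = \kappa_{m,n}m$,
$x_o^n$ is also a feasible point in (4). But, by assumption,
$\hat{x}_o^n$ is the solution of (4). Therefore,

$$\|A\hat{x}_o^n - y^d\|_2^2 \le \|Ax_o^n - y^d\|_2^2 = \|Ax_o^n - Ax_o^n - w^d\|_2^2 = \|w^d\|_2^2. \qquad (6)$$

Expanding $\|A\hat{x}_o^n - y^d\|_2^2 = \|A\hat{x}_o^n - Ax_o^n - w^d\|_2^2$
in (6), it follows that

$$\|A(\hat{x}_o^n - x_o^n)\|_2^2 + \|w^d\|_2^2 - 2(w^d)^T A(\hat{x}_o^n - x_o^n) \le \|w^d\|_2^2. \qquad (7)$$

Canceling $\|w^d\|_2^2$ from both sides of (7), we obtain

$$\|A(\hat{x}_o^n - x_o^n)\|_2^2 \le 2(w^d)^T A(\hat{x}_o^n - x_o^n) \le 2\left|(w^d)^T A(\hat{x}_o^n - x_o^n)\right|.$$

Let $e_m^n \triangleq x_o^n - [x_o^n]_m$ and
$\hat{e}_m^n \triangleq \hat{x}_o^n - [\hat{x}_o^n]_m$ denote the
quantization errors of the original and the reconstructed signals,
respectively. Using these definitions and the Cauchy-Schwarz
inequality, we find a lower bound for
$\|A(\hat{x}_o^n - x_o^n)\|_2^2$ as

$$\begin{aligned}
\|A(\hat{x}_o^n - x_o^n)\|_2^2
&= \|A([\hat{x}_o^n]_m + \hat{e}_m^n - [x_o^n]_m - e_m^n)\|_2^2 \\
&= \|A([\hat{x}_o^n]_m - [x_o^n]_m) + A(\hat{e}_m^n - e_m^n)\|_2^2 \\
&\ge \|A([\hat{x}_o^n]_m - [x_o^n]_m)\|_2^2 - 2\left|(\hat{e}_m^n - e_m^n)^T A^T A([\hat{x}_o^n]_m - [x_o^n]_m)\right| \\
&\ge \|A([\hat{x}_o^n]_m - [x_o^n]_m)\|_2^2 - 2\|A(\hat{e}_m^n - e_m^n)\|_2\, \|A([\hat{x}_o^n]_m - [x_o^n]_m)\|_2. \qquad (8)
\end{aligned}$$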
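On the other hand, again using our definitions plus the
Cauchy-Schwarz inequality, we find an upper bound on
$|(w^d)^T A(\hat{x}_o^n - x_o^n)|$ as

$$\begin{aligned}
\left|(w^d)^T A(\hat{x}_o^n - x_o^n)\right|
&= \left|([\hat{x}_o^n]_m - [x_o^n]_m + \hat{e}_m^n - e_m^n)^T A^T w^d\right| \\
&\le \left|([\hat{x}_o^n]_m - [x_o^n]_m)^T A^T w^d\right| + \left|(\hat{e}_m^n - e_m^n)^T A^T w^d\right| \\
&\le \left|([\hat{x}_o^n]_m - [x_o^n]_m)^T A^T w^d\right| + \|\hat{e}_m^n - e_m^n\|_2\, \|A^T w^d\|_2. \qquad (9)
\end{aligned}$$

For any $x \in [0,1]$, $0 \le x - [x]_m < 2^{-m}$. Therefore,

$$\|e_m^n - \hat{e}_m^n\|_2 \le \sqrt{n2^{-2m+2}}. \qquad (10)$$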
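The bound (10) follows coordinate-wise from the quantization error being smaller than $2^{-m}$, and it can be checked numerically. The sketch below is a minimal illustration on two random vectors standing in for the original and reconstructed signals; the helper `m_bit` and the chosen values of $n$ and $m$ are hypothetical.

```python
import math
import random

def m_bit(x: float, m: int) -> float:
    """m-bit approximation [x]_m of x in [0, 1)."""
    return math.floor(x * 2**m) / 2**m

rng = random.Random(2)
n, m = 50, 6
x = [rng.random() for _ in range(n)]      # stands in for the original signal
x_hat = [rng.random() for _ in range(n)]  # stands in for the reconstruction

e = [v - m_bit(v, m) for v in x]              # quantization error of x
e_hat = [v - m_bit(v, m) for v in x_hat]      # quantization error of x_hat
diff_norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(e, e_hat)))
bound = math.sqrt(n * 2.0 ** (-2 * m + 2))    # right-hand side of (10)
print(diff_norm, bound)
```

Each coordinate of the two error vectors lies in $[0, 2^{-m})$, so their difference is below $2^{-m}$ in absolute value and the norm stays under the stated bound.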

Let $\mathcal{S}$ be the set of all vectors of length $n$ that can
be written as the difference of two vectors with complexity at most
$\kappa_{m,n}m$; that is,

$$\mathcal{S} \triangleq \{h_1^n - h_2^n : K(h_1^n) \le \kappa_{m,n}m,\ K(h_2^n) \le \kappa_{m,n}m\}.$$

Note that $|\mathcal{S}| \le 2^{2\kappa_{m,n}m}$. Define the event
$\mathcal{E}_1$ as

$$\mathcal{E}_1 \triangleq \{\forall h^n \in \mathcal{S} : |(w^d)^T Ah^n| \le t_1\|h^n\|_2\}.$$

For any fixed $h^n$, $Ah^n$ is an i.i.d. zero-mean Gaussian
vector of length $d$ and variance $\|h^n\|_2^2$. Assuming that
$\|h^n\|_2 = 1$ and applying Lemma 3, we obtain

P (\(w d ) T Ah n \ >h)=P {\\w d \\ 2 G > h) 

= P (\\w% G > tx, \\w% > \fdo-{l + t 2 )) 

+ P (\\w d \\ 2 G>h, \\w d \\ 2 < Vda{l + t 2 )) 
<p(\\w d \\ 2 >Vda(l + t 2 )^ 

+ P (G > hiVdail + t 2 ))- lS 



< e 



-dtl/2 



g 2 CT ^d(l + t 2 ) 



(11) 



Hence, by the union bound and the fact that \S\ < 2 2k " 
[11], we have 



P{£i) < 2 2Km '" m ^e~ dt2 ^ 2 + e ^^u+i^ 
Note that

$$\|A(e_m^n - \hat{e}_m^n)\|_2 \le \sigma_{\max}(A)\, \|e_m^n - \hat{e}_m^n\|_2.$$



(12) 



For $t_3 > 0$, define the event $\mathcal{E}_2$ as

Inequality (17) involves a quadratic equation of ||A r 



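$$\mathcal{E}_2^{(n)} \triangleq \{\sigma_{\max}(A) < (1 + t_3)\sqrt{d} + \sqrt{n}\}.$$

It can be proved that [17]

$$P(\mathcal{E}_2^{(n),c}) \le e^{-dt_3^2/2}. \qquad (13)$$

But if $\sigma_{\max}(A) < (1 + t_3)\sqrt{d} + \sqrt{n}$, then from (10),

$$\|A(e_m^n - \hat{e}_m^n)\|_2 \le \left(1 + (1 + t_3)\sqrt{\frac{d}{n}}\right) 2^{-m+1} n.$$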



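Define the event $\mathcal{E}_3$ as
$\mathcal{E}_3^{(n)} \triangleq \{\forall h^n \in \mathcal{S} : \|Ah^n\|_2^2 \ge (1 - t_4)d\|h^n\|_2^2\}$.
By the union bound and Lemma 2, it follows that

$$P(\mathcal{E}_3^{(n),c}) \le 2^{2\kappa_{m,n}m}\, e^{\frac{d}{2}(t_4 + \log(1 - t_4))}. \qquad (14)$$

Define the event $\mathcal{E}_4$ as

$$\mathcal{E}_4^{(n)} \triangleq \{\forall h^n \in \mathcal{S} : \|Ah^n\|_2^2 \le (1 + t_5)d\|h^n\|_2^2\}.$$

Again by the union bound and Lemma 2, it follows that

$$P(\mathcal{E}_4^{(n),c}) \le 2^{2\kappa_{m,n}m}\, e^{-\frac{d}{2}(t_5 - \log(1 + t_5))}. \qquad (15)$$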
Finally, for $t_6 > 0$, define

$$\mathcal{E}_5^{(n)} \triangleq \{\|A^T w^d\|_2^2 \le nd(1 + t_6)\}.$$

Given $w^d$, $A^T w^d$ is an $n$-dimensional i.i.d. Gaussian random
vector with variance $\|w^d\|_2^2$. Hence, by Lemma 2,

$$P\left(\|A^T w^d\|_2^2 > n\gamma^2(1 + t_7) \,\middle|\, \|w^d\|_2^2 = \gamma^2\right) \le e^{-\frac{n}{2}(t_7 - \log(1 + t_7))}.$$

On the other hand, again by Lemma 2,

$$P\left(\|w^d\|_2^2 < d(1 - t_8)\right) \le e^{\frac{d}{2}(t_8 + \log(1 - t_8))}.$$

Choosing $t_6, t_7, t_8 > 0$ such that $t_6 < t_7$ and
$1 + t_6 = (1 - t_8)(1 + t_7)$, it follows that

$$\begin{aligned}
P\left(\|A^T w^d\|_2^2 > nd(1 + t_6)\right)
&= P\left(\|A^T w^d\|_2^2 > nd(1 + t_6),\ \|w^d\|_2^2 > d(1 - t_8)\right) \\
&\quad + P\left(\|A^T w^d\|_2^2 > nd(1 + t_6),\ \|w^d\|_2^2 < d(1 - t_8)\right) \\
&\le e^{-\frac{n}{2}(t_7 - \log(1 + t_7))} + e^{\frac{d}{2}(t_8 + \log(1 - t_8))}. \qquad (16)
\end{aligned}$$

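Combining (8) and (9) and the upper and lower bounds
derived for the corresponding terms of (8) and (9), and
choosing $t_1 = 2\sigma\sqrt{d(1 + t_2)(2\kappa_{m,n}m)}$, with
probability
$P(\mathcal{E}_1 \cap \mathcal{E}_2 \cap \mathcal{E}_3 \cap \mathcal{E}_4 \cap \mathcal{E}_5)$,
the following inequality holds for
$\Delta_m \triangleq [\hat{x}_o^n]_m - [x_o^n]_m$:

$$\begin{aligned}
&(1 - t_4)\sqrt{d}\, \|\Delta_m\|_2^2
 - 2\left(\sqrt{1 + t_5}\, 2^{-m+1}\sqrt{n}\left((1 + t_3)\sqrt{d} + \sqrt{n}\right)\right)\|\Delta_m\|_2 \\
&\quad - 2\left(\sigma\sqrt{1 + t_2}\sqrt{2\kappa_{m,n}m}\right)\|\Delta_m\|_2
 - 2^{-m+1}\sqrt{1 + t_6}\, n \le 0. \qquad (17)
\end{aligned}$$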



Finding the roots of this quadratic equation, using
$\sqrt{1 + x} \le 1 + x/2$, and replacing $m = \lceil \log n \rceil$,
we obtain



2 < (773-y/ (2K m ,„logn)d 



(18) 




where 7l = y /T+^ (l + t 8 )(l - t*) -1 , 7 2 = VT +h(l - 
ti)-\ 73 = VT+hil-U)- 1 and 74 = % /TT^(l-t 4 )" 1 . 
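On the other hand, by the union bound,

$$\begin{aligned}
P\left((\mathcal{E}_1 \cap \mathcal{E}_2 \cap \mathcal{E}_3 \cap \mathcal{E}_4 \cap \mathcal{E}_5)^c\right)
&= P(\mathcal{E}_1^c \cup \mathcal{E}_2^c \cup \mathcal{E}_3^c \cup \mathcal{E}_4^c \cup \mathcal{E}_5^c) \\
&\le P(\mathcal{E}_1^c) + P(\mathcal{E}_2^c) + P(\mathcal{E}_3^c) + P(\mathcal{E}_4^c) + P(\mathcal{E}_5^c). \qquad (19)
\end{aligned}$$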

Given d = 8rK m n m, choosing ti = i 4 = l/-\/r and 
fixing fx, t3, ts, . . . ,t% at appropriate fixed small numbers, 
(12), (13), (14), (15) and (16) guarantee that (19) goes to 
zero, as n — > oo. Moreover, for chosen parameters, 73 < 
Finally, for any e > 0, for n large enough, 
f ydF^-ji < e. This concludes the 

□ 

V. Related work 

The MCP algorithm proposed in [15] is mainly inspired
by [18] and [19]. Consider the universal denoising problem
where $\theta$ is corrupted by additive white Gaussian noise as
$X^n = \theta + Z^n$. The denoiser's goal is to recover $\theta$
from the noisy observation $X^n$. The minimum Kolmogorov complexity
estimation (MKCE) approach proposed in [18] suggests
a denoiser that looks for the sequence $\hat{\theta}$ with minimum
Kolmogorov complexity among all the vectors that are within
some distance of the observation vector $X^n$. [18] shows that
if $\theta_i$ are i.i.d., then under certain conditions, the average
marginal conditional distribution of $\hat{\theta}_i$ given $X_i$
tends to the actual posterior distribution of $\theta_1$ given $X_1$.

In [18], the authors consider the problem of recovering a
low-complexity sequence from its linear measurements. Let
$\mathcal{S}(k_0) \triangleq \{x^n \in [0,1]^n : K(x^n) \le k_0\}$.
Consider measuring $x_o^n \in \mathcal{S}(k_0)$ using a $d \times n$
binary matrix $A$. Let $y^d = Ax_o^n$. To recover $x_o^n$ from
measurements $y^d$, [18] suggests finding $\hat{x}^n$ as

$$\hat{x}^n(y^d, A) = \arg\min_{x^n :\, y^d = Ax^n} K(x^n),$$

and proves that if $d > 2k_0$, then this algorithm is able to find
$x_o^n$ with high probability. Clearly, assuming that a real-valued
sequence has a low complexity is very restrictive, and hence
$\mathcal{S}(k_0)$ does not include any of the classes that have
been studied in the CS literature. For instance, most one-sparse
signals have infinite Kolmogorov complexity, and hence the result
of [18] does not provide useful information.

In a recent and independent work, [20] and [21] consider a 
scheme similar to MCP. For a stationary and ergodic source, 
they propose an algorithm to approximate MCP. While the 
empirical results are promising, no theoretical guarantees are 
provided on either the performance of MCP or their final 
algorithm. 

The notion of sparsity has already been generalized in
the literature in several different directions [3], [4], [9],
[22]. More recently, [22] introduced the class of simple
functions and atomic norm as a framework that unifies some
of the above observations and extends them to some other
signal classes. While all these models can be considered as
subclasses of the general model considered in this paper,
it is worth noting that even though the recovery approach
proposed here is universal, given the incomputability of
Kolmogorov complexity, it is not useful for practical purposes.
Finding practical algorithms with provable performance
guarantees is left for future research.

In this paper, we have focused on deterministic signal
models. For the case of random signals, [23] considers the
problem of recovering a memoryless process from its undersampled
linear measurements and establishes a connection
between the required number of measurements and the Rényi
entropy of the source. Also, our work is in the same spirit
as the minimum entropy decoder proposed by Csiszár in
[24], which is a universal decoder for reconstructing an i.i.d.
signal from its linear measurements.

VI. Conclusion 

In this paper, we have considered the problem of recovering
structured signals from their random linear measurements.
We have investigated the minimum complexity pursuit
(MCP) scheme. Our results confirm that if the Kolmogorov
complexity of the signal is upper bounded by $k$, then MCP
recovers the signal accurately from $O(k \log n)$ random linear
measurements, which is much smaller than the ambient
dimension. In this paper, we have specifically proved that
MCP is stable, such that the $\ell_2$-norm of the reconstruction
error is proportional to the standard deviation of the noise.

Appendix A 

The following two lemmas are frequently used in our 
proofs. 

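Lemma 2 ($\chi^2$ concentration): Fix $\tau > 0$, and let
$Z_i \sim \mathcal{N}(0,1)$, $i = 1, 2, \ldots, d$, be i.i.d. Then,

$$P\left(\sum_{i=1}^{d} Z_i^2 < d(1 - \tau)\right) \le e^{\frac{d}{2}(\tau + \log(1 - \tau))},$$

and

$$P\left(\sum_{i=1}^{d} Z_i^2 > d(1 + \tau)\right) \le e^{-\frac{d}{2}(\tau - \log(1 + \tau))}.$$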

The proof of Lemma 2 is presented in [15]. 
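The two tail bounds can be sanity-checked by simulation. The sketch below is a minimal Monte Carlo experiment, assuming the lower-tail bound $e^{\frac{d}{2}(\tau + \log(1-\tau))}$ and the upper-tail bound $e^{-\frac{d}{2}(\tau - \log(1+\tau))}$; the function names and the values of $d$, $\tau$ and the number of trials are hypothetical choices.

```python
import math
import random

def chi2_lower_tail_bound(d: int, tau: float) -> float:
    """Assumed bound on P(sum Z_i^2 < d(1 - tau))."""
    return math.exp(0.5 * d * (tau + math.log(1 - tau)))

def chi2_upper_tail_bound(d: int, tau: float) -> float:
    """Assumed bound on P(sum Z_i^2 > d(1 + tau))."""
    return math.exp(-0.5 * d * (tau - math.log(1 + tau)))

rng = random.Random(3)
d, tau, trials = 20, 0.5, 5000
low = high = 0
for _ in range(trials):
    s = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))  # chi-square with d degrees
    low += s < d * (1 - tau)
    high += s > d * (1 + tau)

p_low, p_high = low / trials, high / trials
print(p_low, chi2_lower_tail_bound(d, tau))
print(p_high, chi2_upper_tail_bound(d, tau))
```

The empirical frequencies sit well below the bounds, as expected for Chernoff-type inequalities, which are not tight for moderate $d$.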

Lemma 3: Let $X^n$ and $Y^n$ denote two independent Gaussian
random vectors with i.i.d. elements. Further assume that
for $i = 1, \ldots, n$, $X_i \sim \mathcal{N}(0,1)$ and
$Y_i \sim \mathcal{N}(0,1)$. Then the distribution of
$(X^n)^T Y^n = \sum_{i=1}^{n} X_i Y_i$ is the same as the
distribution of $\|X^n\|_2 G$, where $G \sim \mathcal{N}(0,1)$ is
independent of $\|X^n\|_2$.

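Proof: We need to show that $(X^n)^T Y^n / \|X^n\|_2$ is
distributed as $\mathcal{N}(0,1)$ and is independent of
$\|X^n\|_2$. To prove the first claim, note that

$$\frac{(X^n)^T Y^n}{\|X^n\|_2} = \sum_{i=1}^{n} \frac{X_i}{\|X^n\|_2} Y_i.$$

On the other hand, given $X^n / \|X^n\|_2 = a^n$,

$$\sum_{i=1}^{n} a_i Y_i \sim \mathcal{N}(0,1), \qquad \text{(A-1)}$$

because $\sum_{i=1}^{n} a_i^2 = 1$. Therefore, since the distribution
of $\sum_{i=1}^{n} \frac{X_i}{\|X^n\|_2} Y_i$ given
$X^n / \|X^n\|_2 = a^n$ is independent of $a^n$,

$$\sum_{i=1}^{n} \frac{X_i}{\|X^n\|_2} Y_i \sim \mathcal{N}(0,1).$$

To prove the independence, note that $X^n / \|X^n\|_2$ and $Y^n$
are both independent of $\|X^n\|_2$.

□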
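Lemma 3 can also be checked empirically by comparing moments of the two constructions. The following is a rough Monte Carlo sketch: both $(X^n)^T Y^n$ and $\|X^n\|_2 G$ should have second moment $n$. The sample size, the seed, and the helper `second_moment` are hypothetical, and matching one moment is of course only a consistency check, not a proof of equality in distribution.

```python
import math
import random

rng = random.Random(4)
n, trials = 5, 20000
inner, scaled = [], []
for _ in range(trials):
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    inner.append(sum(a * b for a, b in zip(x, y)))        # (X^n)^T Y^n
    x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]          # fresh copy of X^n
    g = rng.gauss(0.0, 1.0)                               # G independent of X^n
    scaled.append(math.sqrt(sum(a * a for a in x2)) * g)  # ||X^n||_2 G

def second_moment(values):
    """Empirical mean of the squared samples."""
    return sum(v * v for v in values) / len(values)

print(second_moment(inner), second_moment(scaled))
```

Both estimates should be close to $n = 5$, in line with $E[\|X^n\|_2^2 G^2] = E[\|X^n\|_2^2]\,E[G^2] = n$.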